# Multimodal Audio Understanding

Qwen 2 Audio Instruct Dynamic Fp8

Qwen2-Audio is the latest version of the Qwen large audio-language model series. It accepts a variety of audio signal inputs and can either analyze the audio or generate a text response directly from spoken instructions.

Transformers English

Mini Ichigo Llama3.2 3B S Instruct

The Ichigo-llama3s series is a family of multimodal language models developed by Homebrew Research that natively understand both audio and text input. The models are based on the Llama-3 architecture and are trained with WhisperVQ as the audio tokenizer, which improves their audio understanding.

Text-to-Audio English
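
The description above says audio is turned into tokens by a vector-quantizing tokenizer (WhisperVQ). As a rough illustration of that general idea only, the sketch below maps each audio feature frame to the index of its nearest codebook vector. It is not the WhisperVQ or Ichigo implementation: the function names, the tiny 2-dimensional codebook, and the `<|sound_NNNN|>` token format are all hypothetical and chosen for readability.

```python
from typing import List, Sequence


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature frame to the index of its nearest codebook vector.

    Nearest is measured by squared Euclidean distance. This is plain
    vector quantization, used here as a stand-in for a learned audio tokenizer.
    """
    tokens = []
    for frame in frames:
        best_index = min(
            range(len(codebook)),
            key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, codebook[i])),
        )
        tokens.append(best_index)
    return tokens


def to_prompt_tokens(tokens: Sequence[int]) -> str:
    """Render token ids as text placeholders (hypothetical format)."""
    return "".join(f"<|sound_{t:04d}|>" for t in tokens)


# Toy data: three codebook entries and four 2-dimensional "audio frames".
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8], [0.1, 0.9]]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 2]
print(to_prompt_tokens(tokens))
```

Once audio is represented as discrete ids like these, a text language model can consume it in the same sequence as ordinary text tokens, which is the sense in which such a model handles audio input "natively".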


© 2025 AIbase